 vegetation index


EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

Akande, Hammed A., Gidado, Abdulrauf A.

arXiv.org Machine Learning

Increasing climate change and habitat loss are driving unprecedented shifts in species distributions. Conservation professionals urgently need timely, high-resolution predictions of biodiversity risks, especially in ecologically diverse regions like Africa. We propose EcoCast, a spatio-temporal model designed for continual biodiversity and climate risk forecasting. Utilizing multisource satellite imagery, climate data, and citizen science occurrence records, EcoCast predicts near-term (monthly to seasonal) shifts in species distributions through sequence-based transformers that model spatio-temporal environmental dependencies. The architecture is designed with support for continual learning to enable future operational deployment with new data streams. Our pilot study in Africa shows promising improvements in forecasting distributions of selected bird species compared to a Random Forest baseline, highlighting EcoCast's potential to inform targeted conservation policies. By demonstrating an end-to-end pipeline from multi-modal data ingestion to operational forecasting, EcoCast bridges the gap between cutting-edge machine learning and biodiversity management, ultimately guiding data-driven strategies for climate resilience and ecosystem conservation throughout Africa.


IberFire -- a detailed creation of a spatio-temporal dataset for wildfire risk assessment in Spain

Erzibengoa, Julen, Gómez-Omella, Meritxell, Goienetxea, Izaro

arXiv.org Artificial Intelligence

Wildfires pose a threat to ecosystems, economies and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models require high-resolution spatio-temporal data to capture complex dynamics of environmental and human factors. To address the scarcity of fine-grained wildfire datasets in Spain, we introduce IberFire: a spatio-temporal dataset with 1 km x 1 km x 1-day resolution, covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 120 features across eight categories: auxiliary data, fire history, geography, topography, meteorology, vegetation indices, human activity and land cover. All features and processing rely on open-access data and tools, with a publicly available codebase ensuring transparency and applicability. IberFire offers enhanced spatial granularity and feature diversity compared to existing European datasets, and provides a reproducible framework. It supports advanced wildfire risk modelling via Machine Learning and Deep Learning, facilitates climate trend analysis, and informs fire prevention and land management strategies. The dataset is freely available on Zenodo to promote open research and collaboration.


Forest tree species classification and entropy-derived uncertainty mapping using extreme gradient boosting and Sentinel-1/2 data

Abdi, Abdulhakim M., Wang, Fan

arXiv.org Machine Learning

We present a wall-to-wall map of dominant tree species in Swedish forests accompanied by pixel-level uncertainty estimates. The tree species classification is based on spatiotemporal metrics derived from Sentinel-1 and Sentinel-2 satellite data, combined with field observations from the Swedish National Forest Inventory and auxiliary data on geomorphometry and canopy height. We apply an extreme gradient boosting model with Bayesian optimization to relate field observations to satellite-derived features and generate the final species map. Classification uncertainty is quantified using Shannon's entropy of the predicted class probabilities, which provides a spatially explicit measure of model confidence. The final model achieved an overall accuracy of 85% (F1 score = 0.82, Matthews correlation coefficient = 0.81), and mapped species distributions showed strong agreement with official forest statistics (r = 0.96). Variable importance analysis revealed that the most influential predictors were optical bands from Sentinel-2, particularly those acquired in spring and summer. This study provides a scalable, interpretable, and policy-relevant method for tree species mapping with integrated uncertainty that is well suited to meet emerging legislative and environmental goals.
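
The entropy-based uncertainty measure mentioned in this abstract can be illustrated with a minimal sketch. The function names and the example probability vectors below are hypothetical, and the normalization by the number of classes is an assumption added for readability; the abstract does not state whether the authors normalize.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy of one pixel's predicted class probabilities."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def normalized_entropy(probs):
    """Entropy scaled to [0, 1] by dividing by log2(number of classes)."""
    k = len(probs)
    return shannon_entropy(probs) / math.log2(k) if k > 1 else 0.0


# Hypothetical per-pixel outputs of a four-class classifier.
confident = [0.97, 0.01, 0.01, 0.01]  # one class dominates: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]  # uniform: maximum entropy

print(normalized_entropy(confident))
print(normalized_entropy(uncertain))
```

Pixels with values near 0 would be mapped as high-confidence predictions, and values near 1 as areas where the classifier cannot separate the candidate species.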


Self-supervised Learning for Hyperspectral Images of Trees

Rahman, Moqsadur, Kumar, Saurav, Palmate, Santosh S., Hossain, M. Shahriar

arXiv.org Artificial Intelligence

Aerial remote sensing using multispectral and RGB imagers has provided a critical impetus to precision agriculture. Analysis of the hyperspectral images with limited or no labels is challenging. This paper focuses on self-supervised learning to create neural network embeddings reflecting vegetation properties of trees from aerial hyperspectral images of crop fields. Experimental results demonstrate that a constructed tree representation, using a vegetation property-related embedding space, performs better in downstream machine learning tasks compared to the direct use of hyperspectral vegetation properties as tree representations.


A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts

John, Angela, Allotey, Selvyn, Koebe, Till, Tyukavina, Alexandra, Weber, Ingmar

arXiv.org Artificial Intelligence

Afforestation and reforestation are popular strategies for mitigating climate change by enhancing carbon sequestration. However, the effectiveness of these efforts is often self-reported by project developers, or certified through processes with limited external validation. This leads to concerns about data reliability and project integrity. In response to increasing scrutiny of voluntary carbon markets, this study presents a dataset on global afforestation and reforestation efforts compiled from primary (meta-)information and augmented with time-series satellite imagery and other secondary data. Our dataset covers 1,289,068 planting sites from 45,628 projects spanning 33 years. Since any remote sensing-based validation effort relies on the integrity of a planting site's geographic boundary, this dataset introduces a standardized assessment of the provided site-level location information, which we summarize in one easy-to-communicate key indicator: LDIS -- the Location Data Integrity Score. We find that approximately 79% of the georeferenced planting sites monitored fail on at least 1 out of 10 LDIS indicators, while 15% of the monitored projects lack machine-readable georeferenced data in the first place. In addition to enhancing accountability in the voluntary carbon market, the presented dataset also holds value as training data for e.g. computer vision-related tasks with millions of linked Sentinel-2 and PlanetScope satellite images.


Sugar-Beet Stress Detection using Satellite Image Time Series

Sadbhave, Bhumika Laxman, Vaeth, Philipp, Dejon, Denise, Schorcht, Gunther, Gregorová, Magda

arXiv.org Artificial Intelligence

Satellite Image Time Series (SITS) data has proven effective for agricultural tasks due to its rich spectral and temporal nature. In this study, we tackle the task of stress detection in sugar-beet fields using a fully unsupervised approach. We propose a 3D convolutional autoencoder model to extract meaningful features from Sentinel-2 image sequences, combined with acquisition-date-specific temporal encodings to better capture the growth dynamics of sugar-beets. The learned representations are used in a downstream clustering task to separate stressed from healthy fields. The resulting stress detection system can be directly applied to data from different years, offering a practical and accessible tool for stress detection in sugar-beets.
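
One common way to encode an acquisition date for a sequence model is a sinusoidal day-of-year encoding. The sketch below is only an illustration of that general idea: the abstract does not describe the form of the authors' temporal encodings, so the function name, the `dim` parameter, and the 365.25-day period are all assumptions.

```python
import math
from datetime import date


def day_of_year_encoding(acquired, dim=4):
    """Hypothetical sinusoidal encoding of an image acquisition date.

    Returns `dim` values (sin/cos pairs at increasing harmonics), so that
    dates close together in the season get similar vectors in every year.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    day = acquired.timetuple().tm_yday
    angle = 2.0 * math.pi * day / 365.25  # assumed annual period
    encoding = []
    for harmonic in range(1, dim // 2 + 1):
        encoding.append(math.sin(harmonic * angle))
        encoding.append(math.cos(harmonic * angle))
    return encoding


# Two hypothetical Sentinel-2 acquisitions, one year apart.
print(day_of_year_encoding(date(2022, 6, 15)))
print(day_of_year_encoding(date(2023, 6, 15)))
```

Because the encoding depends only on the position within the year, the same calendar date in different years yields nearly identical vectors, which is one plausible route to the cross-year applicability the abstract describes.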


Time series classification of satellite data using LSTM networks: an approach for predicting leaf-fall to minimize railroad traffic disruption

de Wilde, Hein, Alsahag, Ali Mohammed Mansoor, Blanchet, Pierre

arXiv.org Artificial Intelligence

Railroad traffic disruption as a result of leaf-fall costs the UK rail industry over 300 million per year, and measures to mitigate such disruptions are employed on a large scale, with 1.67 million kilometers of track being treated in the UK in 2021 alone. Therefore, the ability to anticipate the timing of leaf-fall would offer substantial benefits for rail network operators, enabling the efficient scheduling of such mitigation measures. However, current methodologies for predicting leaf-fall exhibit considerable limitations in terms of scalability and reliability. This study endeavors to devise a prediction system that leverages specialized prediction methods and the latest satellite data sources to generate both scalable and reliable insights into leaf-fall timings. An LSTM network trained on ground-truth leaf-falling data combined with multispectral and meteorological satellite data demonstrated a root-mean-square error of 6.32 days for predicting the start of leaf-fall and 9.31 days for predicting the end of leaf-fall. The model, which improves upon previous work on the topic, offers promising opportunities for the optimization of leaf mitigation measures in the railway industry and the improvement of our understanding of complex ecological systems.
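
For readers unfamiliar with the metric, the root-mean-square error reported in days can be computed as in the following minimal sketch. The day-of-year values are invented for illustration and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical start-of-leaf-fall dates, expressed as day of year.
observed_start = [280, 290, 301, 275]
predicted_start = [284, 287, 295, 279]

print(rmse(predicted_start, observed_start))  # error in days
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which matters when scheduling track treatment around a predicted date.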


Leveraging Novel Ensemble Learning Techniques and Landsat Multispectral Data for Estimating Olive Yields in Tunisia

Kefi, Mohamed, Pham, Tien Dat, Nguyen, Thin, Tjoelker, Mark G., Devasirvatham, Viola, Kashiwagi, Kenichi

arXiv.org Artificial Intelligence

Olive is an important tree crop in Mediterranean climates. However, olive yield varies significantly due to climate change. Accurately estimating yield using remote sensing and machine learning remains a complex challenge. In this study, we developed a streamlined pipeline for olive yield estimation in the Kairouan and Sousse governorates of Tunisia. We extracted features from multispectral reflectance bands and vegetation indices derived from Landsat-8 OLI and Landsat-9 OLI-2 satellite imagery, along with digital elevation model data. These spatial features were combined with ground-based field survey data to form a structured tabular dataset. We then developed an automated ensemble learning framework, implemented using AutoGluon, to train and evaluate multiple machine learning models, select optimal combinations through stacking, and generate robust yield predictions using five-fold cross-validation. The results demonstrate strong predictive performance from both sensors, with Landsat-8 OLI achieving R² = 0.8635 and RMSE = 1.17 tons per ha, and Landsat-9 OLI-2 achieving R² = 0.8378 and RMSE = 1.32 tons per ha. This study highlights a scalable, cost-effective, and accurate method for olive yield estimation, with potential applicability across diverse agricultural regions globally.
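
As an illustration of what a vegetation index derived from Landsat reflectance looks like, the sketch below computes the widely used NDVI from the red and near-infrared bands (band 4 and band 5 on both OLI and OLI-2). The abstract does not list which indices the authors used, so the choice of NDVI is an assumption, and the reflectance values are invented.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Returns None when both reflectances are zero, where the index is undefined.
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Hypothetical surface reflectances: (band 5 NIR, band 4 red) per pixel.
pixels = {
    "dense canopy": (0.50, 0.10),
    "bare soil": (0.30, 0.25),
    "no data": (0.0, 0.0),
}

for label, (nir_value, red_value) in pixels.items():
    print(label, ndvi(nir_value, red_value))
```

Values closer to 1 indicate denser green vegetation. In a tabular pipeline such as the one described, a per-orchard summary of an index like this would become one feature column alongside the raw band reflectances and elevation.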


Multi-modal Data Fusion and Deep Ensemble Learning for Accurate Crop Yield Prediction

Yewle, Akshay Dagadu, Mirzayeva, Laman, Karakuş, Oktay

arXiv.org Artificial Intelligence

This study introduces RicEns-Net, a novel Deep Ensemble model designed to predict crop yields by integrating diverse data sources through multimodal data fusion techniques. The research focuses specifically on the use of synthetic aperture radar (SAR), optical remote sensing data from Sentinel 1, 2, and 3 satellites, and meteorological measurements such as surface temperature and rainfall. The initial field data for the study were acquired through Ernst & Young's (EY) Open Science Challenge 2023. The primary objective is to enhance the precision of crop yield prediction by developing a machine-learning framework capable of handling complex environmental data. A comprehensive data engineering process was employed to select the most informative features from over 100 potential predictors, reducing the set to 15 features from 5 distinct modalities. This step mitigates the "curse of dimensionality" and enhances model performance. The RicEns-Net architecture combines multiple machine learning algorithms in a deep ensemble framework, integrating the strengths of each technique to improve predictive accuracy. Experimental results demonstrate that RicEns-Net achieves a mean absolute error (MAE) of 341 kg/ha (roughly corresponding to 5-6% of the lowest average yield in the region), significantly exceeding the performance of previous state-of-the-art models, including those developed during the EY challenge.


Integrating remote sensing data assimilation, deep learning and large language model for interactive wheat breeding yield prediction

Yang, Guofeng, Jin, Nanfei, Ai, Wenjie, Zheng, Zhonghua, He, Yuhong, He, Yong

arXiv.org Artificial Intelligence

Yield is one of the core goals of crop breeding. By predicting the potential yield of different breeding materials, breeders can screen these materials at various growth stages to select the best performing. Based on unmanned aerial vehicle remote sensing technology, high-throughput crop phenotyping data in breeding areas is collected to provide data support for the breeding decisions of breeders. However, the accuracy of current yield predictions still requires improvement, and the usability and user-friendliness of yield forecasting tools remain suboptimal. To address these challenges, this study introduces a hybrid method and tool for crop yield prediction, designed to allow breeders to interactively and accurately predict wheat yield by chatting with a large language model (LLM). First, the newly designed data assimilation algorithm is used to assimilate the leaf area index into the WOFOST model. Then, selected outputs from the assimilation process, along with remote sensing inversion results, are used to drive the time-series temporal fusion transformer model for wheat yield prediction. Finally, based on this hybrid method and leveraging an LLM with retrieval augmented generation technology, we developed an interactive yield prediction Web tool that is user-friendly and supports sustainable data updates. This tool integrates multi-source data to assist breeding decision-making. This study aims to accelerate the identification of high-yield materials in the breeding process, enhance breeding efficiency, and enable more scientific and smart breeding decisions.